From ~2.5 million spectra, we aim to detect Lyman-break galaxies (LBGs) by comparing against a known template. In particular, good candidates should:
The first requirement is easier than the second, since we are provided a high-quality LBG template to compare with. If a similarly high-quality template were to be provided for quasars, it should be possible to substantially improve these results.
The main consideration in designing the method was robustness. Every step in the process should be as robust as possible against:
Carefully selected smoothing methods were effective at reducing anticipated noise, convolution matching of absorption troughs was effective at correcting for horizontal shifts, and regressing on quantiles was effective at correcting vertical shifting/scaling.
More details are provided below.
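As a rough illustration of the three ideas above (smoothing, horizontal alignment, vertical alignment), here is a minimal Python sketch. It is not the code used in this project: the running-median smoother, the normalized cross-correlation used as a stand-in for the convolution-based trough matching, the quantile grid, and all function names (`running_median`, `best_offset`, `quantile_fit`) are assumptions chosen for illustration, and the demo data is synthetic.

```python
import numpy as np


def running_median(y, width=5):
    """Centered running median (odd width); robust to isolated noise spikes."""
    half = width // 2
    padded = np.pad(y, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)


def best_offset(target, template, max_shift=20):
    """Shift s (in samples) that maximizes the Pearson correlation between
    target[i + s] and template[i] over the overlapping region."""
    n_t, n_m = len(target), len(template)
    best_s, best_r = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        lo, hi = max(0, -s), min(n_m, n_t - s)  # valid template indices
        if hi - lo < 3:
            continue
        a = target[lo + s:hi + s] - target[lo + s:hi + s].mean()
        b = template[lo:hi] - template[lo:hi].mean()
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom == 0:
            continue  # a flat segment carries no shape information
        r = np.dot(a, b) / denom
        if r > best_r:
            best_s, best_r = s, r
    return best_s


def quantile_fit(target, template, probs=None):
    """Least-squares line mapping template quantiles onto target quantiles.
    Returns (slope, intercept), i.e. the vertical scale and shift."""
    if probs is None:
        probs = np.linspace(0.05, 0.95, 19)  # hypothetical quantile grid
    q_template = np.quantile(template, probs)
    q_target = np.quantile(target, probs)
    slope, intercept = np.polyfit(q_template, q_target, 1)
    return slope, intercept


# Synthetic demo: a flat continuum with two absorption troughs.
x = np.arange(200)
template = (1.0
            - 0.8 * np.exp(-0.5 * ((x - 60) / 3.0) ** 2)
            - 0.5 * np.exp(-0.5 * ((x - 140) / 5.0) ** 2))
target = 2.0 * np.roll(template, 7) + 3.0  # shifted, scaled, offset copy

offset = best_offset(running_median(target), running_median(template))
aligned = np.roll(target, -offset)          # undo the horizontal shift
slope, intercept = quantile_fit(aligned, template)
print(offset, round(slope, 3), round(intercept, 3))
```

Because the correlation is normalized within each overlap, the offset estimate does not depend on the target's vertical scale or shift, so the horizontal and vertical corrections can be done one after the other.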
The process was carefully constructed and tuned with the help of several R Shiny debugging apps I made (see repo for details). Below is a more in-depth overview:
Steps 5-6 above are kind of hacky. If we had a high-quality quasar template to match against, we might be able to replace them with more elegant code that also achieves higher specificity (though this is only a conjecture).
Though the method seems complex, it’s very computationally efficient. Each target spectrum takes around \(0.13\,\text{s}\) to process and requires \(<1\,\text{GB}\) of memory (which is recycled for the next iteration). Most of this time (72%) and memory (58%) is actually spent simply reading in the target spectrum file.
The 2.459 million spectra in the dataset were analyzed using CHTC. By splitting the work into 2459 jobs (1000 spectra each), with each job running on 1 core with 1 GB of memory and even less disk, we reduced the total runtime from a serial estimate of \((2.459\!\times\!10^{6})\cdot(0.13\,\text{s})\approx89\,\text{hours}\) to about 3 hours of real time (including time spent in the queue waiting for an open execute node).
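The job arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the spectrum count is the rounded figure quoted in the text (the exact count is an assumption), and `chunk_bounds` is a hypothetical helper showing one way to split the work into fixed-size jobs, not the actual submit logic.

```python
import math

N_SPECTRA = 2_459_000          # assumption: "2.459 million" taken at face value
SECONDS_PER_SPECTRUM = 0.13    # per-spectrum runtime quoted in the text
SPECTRA_PER_JOB = 1000


def chunk_bounds(n_items, chunk_size):
    """Yield half-open (start, stop) index ranges, one per job."""
    for start in range(0, n_items, chunk_size):
        yield start, min(start + chunk_size, n_items)


n_jobs = math.ceil(N_SPECTRA / SPECTRA_PER_JOB)
serial_hours = N_SPECTRA * SECONDS_PER_SPECTRUM / 3600
job_seconds = SPECTRA_PER_JOB * SECONDS_PER_SPECTRUM

print(f"jobs: {n_jobs}")
print(f"serial estimate: {serial_hours:.1f} h")
print(f"compute time per job: {job_seconds:.0f} s")
```

Each job only needs a couple of minutes of compute, so the roughly 3 hours of wall-clock time is presumably dominated by queueing and scheduling rather than by the analysis itself.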
A table of the top 500 results, ranked by \(C\), can be viewed here.
Plots of some of the best matches are shown below. Most of the info in the titles/labels is there for my own debugging purposes. Notably, each title shows the offset index (redshift correction) along with \(A_{\text{peak}}\), \(K_{\text{trans}}\), and \(C\). The template is shifted down from the target spectrum for ease of comparison.
For each spectrum, the composite score is also shown on the side for convenience. Initially, I was going to give a (likely poor) guess at what type of astronomical body each one may be, but I ultimately decided against it since I’m not an astronomer and would have very little confidence in my guesses 🙃 (there seem to be quite a few potential quasars in here with broad CIV and CIII emission bands).